Abstract

Recent genomic and bioinformatic advances have motivated the development of numerous random network models purporting to describe graphs of biological, technological, and sociological origin. The success of a model has been evaluated by how well it reproduces a few key features of the real-world data, such as degree distributions, mean geodesic lengths, and clustering coefficients. Often pairs of models can reproduce these features with indistinguishable fidelity despite being generated by vastly different mechanisms. In such cases, these few target features are insufficient to distinguish which of the different models best describes real-world networks of interest; moreover, it is not clear a priori that any of the presently existing algorithms for network generation offers a predictive description of the networks inspiring them. To derive discriminative classifiers, we construct a mapping from the set of all graphs to a high-dimensional (in principle infinite-dimensional) "word space." This map defines an input space for classification schemes which allows us for the first time to state unambiguously which models are most descriptive of the networks they purport to describe. Our training sets include networks generated from 17 models either drawn from the literature or introduced in this work, source code for which is freely available [1]. We anticipate that this new approach to network analysis will be of broad impact to a number of communities.



1 Introduction 



The post-genomic revolution has ushered in an ensemble of novel crises and opportunities in rethinking molecular biology. The two principal directions in genomics, sequencing and transcriptome studies, have brought to light a number of new questions and forced the development of numerous computational and mathematical tools for their resolution. The sequencing of whole organisms, including Homo sapiens, has shown that there are in fact roughly the same number of genes in men and in mice. Moreover, many of the coding regions of the chromosomes (the subsequences which are directly translated into proteins) are highly homologous. The complexity comes, then, not from a larger number of parts, or more complex parts, but rather from the complexity of their interactions and interconnections.

Coincident with this biological revolution, with its massive and unprecedented volume of biological data, a technological revolution has blossomed with the popularization and resulting exponential growth of the Internet. Researchers studying the topology of the Internet [2] and the World Wide Web [3] attempted to summarize their topologies via statistical quantities, primarily the distribution $P(k)$ over nodes of given connectivity or degree $k$, which, it was found, was completely unlike that of a "random" or Erdos-Renyi graph.^1 Instead, the distribution obeyed a power law $P(k) \sim k^{-\gamma}$ for large $k$. This observation created a flurry of activity among mathematicians at the turn of the millennium both in (i) measuring the degree distributions of innumerable technological, sociological, and biological graphs (which generically, it turned out, obeyed such power-law distributions) and (ii) proposing myriad models of randomly generated graph topologies which mimicked these degree distributions (cf. [6] for a thorough review). The success of these latter efforts reveals a conundrum for mathematical modeling: a metric which is universal (rather than discriminative) cannot be used for choosing the model which best describes a network of interest. The question posed is one of classification, meaning the construction of an algorithm, based on training data from multiple classes, which can place data of interest within one of the classes with small test loss.



^1 It will be a question for historians of science to ponder
why the Erdos-Renyi model of networks was used
as the universal straw man, rather than the Price model 
(4113, inspired by a naturally-occurring graph (the cita- 
tion graph), which gives a power-law degree distribution. 
Figure 1: Ambiguity in network mechanisms: we plot the degree distribution of two graphs generated using radically different algorithms. The red line results from an algorithm of the Barabasi class [3]; the blue from the "static" model of Kim et al. [8]. The distributions are indistinguishable, illustrating the insufficiency of degree distributions as a classifying metric.



In this paper, we present a natural mapping from 
a graph to an infinite-dimensional vector space 
using simple operations on the adjacency ma- 
trix. We then test a number of different classifi- 
cation (including density estimation) algorithms 
which prove to be effective in finding new met- 
rics for classifying real world data sets. We se- 
lected 17 different models proposed in the litera- 
ture to model various properties of naturally oc- 
curring networks. Among them are various bi- 
ologically inspired graph-generating algorithms 
which were put forward to model genetic or pro- 
tein interaction networks. To assess their value 
as models of their intended referent, we classify 
data sets for the E. coli genetic network, the C. 
elegans neural network and the yeast S. cere- 
visiae protein interaction network. We antici- 
pate that this new approach will provide a gen- 
eral tool of analysis and classification in a broad 
diversity of communities. 
The input space used for classifying graphs was introduced in our earlier work [9] as a technique for finding statistically significant features and subgraphs in naturally occurring biological and technological networks. Given the adjacency matrix $A$ representing a graph (i.e., $A_{ij} = 1$ iff there exists an edge from $j$ to $i$), multiplications of the matrix count the number of walks from one node to another (i.e., $[A^n]_{ij}$ is the number of unique walks from $j$ to $i$ in $n$ steps). Note that the adjacency matrix of an undirected graph is symmetric. The topological structure of a network is characterized by the number of closed and open walks of given length. These can be found by calculating the diagonal or non-diagonal components of the matrix, respectively. For this we define the projection operation $D$ such that

$$[D(A)]_{ij} = A_{ij}\,\delta_{ij} \qquad (1)$$

and its complement $U = I - D$. (Note that we do not use Einstein's summation convention. Indices $i$ and $j$ are not summed over.) We define the primitive alphabet $\{A; T, U, D\}$ as the adjacency matrix $A$ and the operations $T$, $U$, $D$, with the transpose operation $T(A) = A^T$ distinguishing walks "up" the graph from walks "down" the graph. From the letters of this alphabet we can construct words (a series of operations) of arbitrary length. A number of redundancies and trivial cases can be eliminated (for example, the projection operations satisfy $DU = UD = 0$), leading to the operational alphabet $\{A, AT, AU, AD, AUT\}$. The resulting word is a matrix representing a set of possible walks generated by the original graph. An example is shown in Figure 2.
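As a minimal sketch of how a word can be evaluated, the snippet below treats a word as a string over the primitive letters A, T, U, D and applies it right to left, so that each operator acts on the whole product to its right (for example, DATA is read as D(A T(A))). This evaluation order, the function names, and the toy graph are assumptions made for illustration; they are not taken from the authors' source code.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(X):
    """T: transpose, i.e. reverse the direction of every step."""
    return [list(col) for col in zip(*X)]


def diag_part(X):
    """D: keep only the diagonal (walks that end where they started)."""
    n = len(X)
    return [[X[i][j] if i == j else 0 for j in range(n)] for i in range(n)]


def offdiag_part(X):
    """U = I - D: keep only the off-diagonal entries."""
    n = len(X)
    return [[X[i][j] if i != j else 0 for j in range(n)] for i in range(n)]


def eval_word(word, A):
    """Evaluate a word such as 'DATA' right to left (assumed convention)."""
    if not word or word[-1] != "A":
        raise ValueError("a word must end with the letter A")
    M = [row[:] for row in A]
    for letter in reversed(word[:-1]):
        if letter == "A":
            M = matmul(A, M)
        elif letter == "T":
            M = transpose(M)
        elif letter == "U":
            M = offdiag_part(M)
        elif letter == "D":
            M = diag_part(M)
        else:
            raise ValueError("unknown letter: " + letter)
    return M


def word_sum(M):
    """sum: total number of walks counted by the word matrix."""
    return sum(sum(row) for row in M)


def word_nnz(M):
    """nnz: number of endpoint pairs joined by at least one such walk."""
    return sum(1 for row in M for value in row if value != 0)


# Toy directed graph with edges 0 -> 1 and 0 -> 2; A[i][j] = 1 iff edge j -> i.
A = [[0, 0, 0],
     [1, 0, 0],
     [1, 0, 0]]
closed_walks = word_sum(eval_word("DATA", A))
open_walks = word_sum(eval_word("UATA", A))
```

On this toy graph the word ATA pairs up the two edges leaving node 0, and the D and U projections separate the walks that return to their starting node from those that end elsewhere.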

Each word determines two relevant statistics of the network: the number of distinct walks and the number of distinct pairs of endpoints. These two statistics are determined by either summing the entries of the matrix (sum) or counting the number of nonzero elements (nnz) of the matrix, respectively. Thus the two operations sum and nnz map words to integers. This allows us to plot any graph in a high-dimensional data space: the coordinates are the integers resulting from these path-based functionals of the graph's adjacency matrix.

Figure 2: The elements of the matrix $ATA$ count these two walks. $TA$ corresponds to one step "up" the graph, the following $A$ to one step "down". The last node could be either the same as the starting node, as in the first subgraph (accounted for by the diagonal part $DATA$), or a different node, as in the second subgraph (accounted for by the non-diagonal part $UATA$).

The coordinates of the infinite-dimensional data space are given by integer-valued functionals

$$F(L_1 L_2 \ldots L_n A) \qquad (2)$$

where each $L_i$ is a letter of the operational alphabet and $F$ is an operator from the set {sum, sumD, sumU, nnz, nnzD, nnzU}. We found it necessary only to evaluate words with $n \le 4$ (counting all walks up to length 5) to construct low test-loss classifiers. Therefore, our word space is a $6\sum_{i=1}^{4} 5^i = 4680$-dimensional vector space, but since the words are not linearly independent (e.g., sumU + sumD = sum), the dimensionality of the manifold explored is actually much smaller. However, we continue to use the full data space since a particular word, though it may be expressed as a linear combination of other words, may be a better discriminator than any of its summands.
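The dimension count can be checked by enumerating the coordinates directly. The sketch below builds one name per (operator, word) pair from the five-letter operational alphabet and the six operators quoted in the text; the naming scheme `op(word)` and the function name are assumptions made for illustration.

```python
from itertools import product

# Operational alphabet and operators as listed in the text.
LETTERS = ["A", "AT", "AU", "AD", "AUT"]
OPERATORS = ["sum", "sumD", "sumU", "nnz", "nnzD", "nnzU"]


def feature_names(max_letters=4):
    """List every coordinate F(L_1 ... L_n A) with 1 <= n <= max_letters."""
    names = []
    for n in range(1, max_letters + 1):
        for letters in product(LETTERS, repeat=n):
            word = "".join(letters) + "A"  # every word ends with a final A
            for op in OPERATORS:
                names.append(op + "(" + word + ")")
    return names


names = feature_names()
expected = 6 * sum(5 ** i for i in range(1, 5))
```

Because every letter starts with A and contains exactly one A, each concatenated string decomposes uniquely into letters, so the enumerated names are all distinct.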

In [9], we discuss several possible interpretations of words, motivated by algorithms for finding subgraphs. Previously studied metrics can sometimes be interpreted in the context of words. For example, the transitivity of a network can be defined as 3 times the number of 3-cycles divided by the number of pairs of edges that are incident on a common vertex. For a loopless graph (without self-interactions), this can also be calculated as a simple expression in word space: sum(DAAA)/sum(UAA). Note that this expression of transitivity as the quotient of two words implies separation in two dimensions rather than in one. However, there are limitations to word space. For example, a similar measure, the clustering coefficient, defined as the average over all vertices of the number of 3-cycles containing the vertex divided by the number of paths of length two centered at that vertex, cannot be easily expressed in word space because vertices must be considered individually to compute this quantity. Of course, the utility of word space is not that it encompasses previously studied metrics, but that it can elucidate new metrics in an unbiased, systematic way, as illustrated below.
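The word-space expression for transitivity can be checked on a small example. The sketch below compares sum(DAAA)/sum(UAA) with the direct definition (3 times the number of triangles divided by the number of edge pairs sharing a vertex) on an undirected, loopless toy graph; the graph and function names are illustrative assumptions.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitivity_from_words(A):
    """sum(DAAA) / sum(UAA) for a symmetric, loopless adjacency matrix."""
    n = len(A)
    A2 = matmul(A, A)
    A3 = matmul(A2, A)
    closed = sum(A3[i][i] for i in range(n))  # sum(DAAA): closed 3-walks
    open_two = sum(A2[i][j] for i in range(n) for j in range(n) if i != j)  # sum(UAA)
    return closed / open_two


def transitivity_direct(A):
    """3 * (number of triangles) / (number of edge pairs sharing a vertex)."""
    n = len(A)
    triangles = sum(A[i][j] * A[j][k] * A[i][k]
                    for i in range(n) for j in range(i + 1, n)
                    for k in range(j + 1, n))
    degrees = [sum(row) for row in A]
    edge_pairs = sum(d * (d - 1) // 2 for d in degrees)
    return 3 * triangles / edge_pairs


# Triangle on nodes 0, 1, 2 with a pendant node 3 attached to node 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
```

The two routes agree because each triangle contributes 6 closed walks of length 3, while each pair of edges sharing a vertex contributes 2 open walks of length 2, so the factors cancel to the stated 3-to-1 ratio.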



2 Classification Methods 



2.1 SVMs 

A standard classification algorithm which has been used with great success in myriad fields is the support vector machine, or SVM [10]. This technique constructs a hyperplane in a high-dimensional feature space separating two classes from each other. Linear kernels are used for the analysis presented here; extensions to appropriate nonlinear kernels are possible.

We rely on SVM-Light [11], a freely available C implementation, which uses a working set selection method to solve the convex programming problem with Lagrangian

$$L(w, b) = \frac{1}{2}|w|^2 + C\sum_{i=1}^{m} \xi_i \qquad (3)$$

with $y_i(w \cdot x_i + b) \ge 1 - \xi_i$, $i = 1, \ldots, m$, where $f(x) = w \cdot x + b$ is the equation of the hyperplane, $x_i$ are training examples and $y_i \in \{-1, +1\}$ their class labels. Here, $C$ is a fixed parameter determining the trade-off between small errors and a large margin $2/|w|$. We set $C$ to a default value $\left(\frac{1}{m}\sum_{i=1}^{m} x_i \cdot x_i\right)^{-1}$. We observe that training and test losses have a negligible dependence on $C$ since most test losses are near or equal to zero even in low-dimensional projections of the data space.



2.2 Robustness 

Our objective is to determine which of a set 
of proposed models most accurately describes a 
given real data set. After constructing a classi- 
fier enjoying low test loss, we classify our given 
real data set to find a 'best' model. However, 
the real network may lie outside of any of the 
sampled distributions of the proposed models in 
word space. In this case we interpret our clas- 
sification as a prediction of the least erroneous 
model. 

We distinguish between the two cases by noting the following: Consider building a classifier for apples and grapefruit which is then faced with an orange. The classifier may then decide that, based on the feature size, the orange is an apple. However, based on the feature taste, the orange is classified as a grapefruit. Conversely, if we train our classifier on different subsets of words and always get the same prediction, the given real network must come closest to the predicted class based on any given choice of features we might look at. We therefore define a robust classifier as one which consistently classifies a test datum in the same class, irrespective of the subset of features chosen, and we measure robustness as the ratio of the number of consistent predictions to the total number of subspace classifications.
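The robustness measure can be sketched as follows. Here "consistent predictions" is taken to mean predictions that agree with the most frequent class, which is an assumption about the exact definition; `classify` stands for any hypothetical routine that trains on a feature subset and returns the predicted class of the real network, and the toy classifier mimics the orange example.

```python
import random
from collections import Counter


def robustness(predictions):
    """Return (most frequent class, fraction of predictions agreeing with it)."""
    counts = Counter(predictions)
    winner, n_agree = counts.most_common(1)[0]
    return winner, n_agree / len(predictions)


def subspace_predictions(classify, n_features, subset_size, n_trials, seed=0):
    """Call classify on randomly drawn feature subsets (hypothetical interface)."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_trials):
        subset = sorted(rng.sample(range(n_features), subset_size))
        predictions.append(classify(subset))
    return predictions


def toy_classify(subset):
    """Toy orange: feature 0 ('size') says apple, feature 1 ('taste') says grapefruit."""
    return "apple" if 0 in subset else "grapefruit"


toy_predictions = subspace_predictions(toy_classify, n_features=2,
                                       subset_size=1, n_trials=50)
toy_winner, toy_robustness = robustness(toy_predictions)
```

A classifier whose prediction never changes with the feature subset has robustness 1.0, while the toy orange yields a value well below 1 because its label flips with the chosen feature.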



2.3 Generative Classifiers 

A generative model, in which one infers the dis- 
tribution from which observations are drawn, al- 
lows a quantitative measure of model assign- 
ment: the probability of observing a given word- 
value given the model. For a robust classifier, in 
which assignment is not sensitively dependent 
on the set of features chosen, the conditional 
probabilities should consistently be greatest for 
one class. 

We perform density estimations with Gaussian kernels for each individual word, allowing calculation of $p(C = c \mid X_j = x)$, the probability of being assigned to class $c$ given a particular value $x$ of word $j$. By comparing ratios of likelihood values among the different models, it is therefore possible, for the case of non-robust classifiers, to determine which of the features of an orange come closest to an apple and which features come closest to a grapefruit.

We compute the estimated density at a word value $x_0$ from the training data $x_i$ ($i = 1, \ldots, m$) as

$$p(x_0, \lambda) = \frac{1}{m(2\pi\lambda^2)^{1/2}} \sum_{i=1}^{m} e^{-\frac{1}{2}(|x_i - x_0|/\lambda)^2},$$

where we optimize the smoothing parameter $\lambda$ by maximizing the probability of a hold-out set using 5-fold cross-validation. More precisely, we partition the training examples into 5 folds $F_i$, where $F_i$ is the set of indices associated with fold $i$ ($i = 1, \ldots, 5$) and $N_i = \mathrm{card}(F_i)$. We then maximize the hold-out objective $Q(\lambda)$, a double sum $\sum_{i=1}^{5}\sum_{j=1}^{N_i}$ over the held-out examples of each fold, as a function of $\lambda$. In all cases we found that $Q(\lambda)$ had a well-pronounced maximum as long as the data was not oversampled. Because words can only take integer values, too many training examples can lead to the situation that the data take exactly the same values with or without the hold-out set. In this case, maximizing $Q(\lambda)$ corresponds to $p(x, \lambda)$ having single peaks around the integer values, so that $\lambda$ tends to zero. Therefore, we restrict the number of training examples to $4N_v$, where $N_v$ is the number of unique integer values taken by the training set. With this restriction $Q(\lambda)$ showed a well-pronounced maximum at a non-zero $\lambda$ for all words and models.
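A minimal sketch of this bandwidth selection is given below. It assumes that the hold-out objective is the summed log density of the held-out points under the estimate built from the remaining folds, that folds are formed by striding through the data, and that the smoothing parameter is picked from a small grid; all three choices, the function names, and the toy word values are illustrative assumptions rather than the authors' procedure.

```python
import math


def kde(x0, data, lam):
    """Gaussian kernel density estimate at x0 with smoothing parameter lam."""
    norm = len(data) * math.sqrt(2.0 * math.pi * lam ** 2)
    return sum(math.exp(-0.5 * ((x - x0) / lam) ** 2) for x in data) / norm


def holdout_log_likelihood(data, lam, n_folds=5):
    """Summed log density of each fold under the estimate from the other folds."""
    total = 0.0
    for fold in range(n_folds):
        held_out = data[fold::n_folds]
        train = [x for idx, x in enumerate(data) if idx % n_folds != fold]
        for x in held_out:
            # Floor the density so that log() stays finite for very small lam.
            total += math.log(max(kde(x, train, lam), 1e-300))
    return total


def select_bandwidth(data, grid, n_folds=5):
    """Pick the grid value with the largest hold-out log-likelihood."""
    return max(grid, key=lambda lam: holdout_log_likelihood(data, lam, n_folds))


# Toy integer word values (illustrative only, not data from the paper).
toy_values = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8]
grid = [0.1, 0.3, 1.0, 3.0, 10.0]
best_lam = select_bandwidth(toy_values, grid)
```

With so few toy values, the smallest grid entry is heavily penalized because some held-out integers never appear in the remaining folds, which is the opposite regime from the oversampled case described in the text.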



2.4 Word Ranking and Decision Trees

The simplest scheme to find new metrics which 
can distinguish among given models is to take a 
large number of training examples for a pair of 
network models and find the optimal split be- 
tween both classes for every word separately. 
We then test every one-dimensional classifier on 
a hold-out set and rank words by lowest test loss. 
Below we show that this simple approach is al- 
ready very successful. 
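The single-word ranking can be sketched as below: for each word, a threshold is fitted on the training values of two models and the resulting one-dimensional classifier is scored on hold-out values. The data layout (a dictionary mapping each word to a pair of value lists), the word names, and the function names are hypothetical and chosen only for illustration.

```python
def best_split(values_a, values_b):
    """Return (threshold, a_is_low, training_loss) for the best 1-D split."""
    points = sorted(set(values_a) | set(values_b))
    n = len(values_a) + len(values_b)
    if len(points) < 2:
        # All values coincide: no split can do better than the majority class.
        return (points[0], True, min(len(values_a), len(values_b)) / n)
    best = None
    for low, high in zip(points, points[1:]):
        t = (low + high) / 2.0  # midpoint, so no training value equals t
        errors_a_low = sum(v > t for v in values_a) + sum(v < t for v in values_b)
        for a_is_low, errors in ((True, errors_a_low), (False, n - errors_a_low)):
            loss = errors / n
            if best is None or loss < best[2]:
                best = (t, a_is_low, loss)
    return best


def split_loss(split, values_a, values_b):
    """Fraction of misclassified values under a fitted split."""
    t, a_is_low, _ = split
    errors = sum((v < t) != a_is_low for v in values_a)
    errors += sum((v < t) == a_is_low for v in values_b)
    return errors / (len(values_a) + len(values_b))


def rank_words(train, test):
    """Sort words by the hold-out loss of their one-dimensional classifier."""
    scored = []
    for word, (train_a, train_b) in train.items():
        split = best_split(train_a, train_b)
        test_a, test_b = test[word]
        scored.append((split_loss(split, test_a, test_b), word))
    return [word for _, word in sorted(scored)]


# Toy word values for two models (illustrative only).
train = {"w_good": ([1, 2, 3], [10, 11, 12]), "w_bad": ([1, 5, 9], [2, 6, 10])}
test = {"w_good": ([2], [11]), "w_bad": ([6], [5])}
ranking = rank_words(train, test)
```

Midpoints between consecutive observed values are used as candidate thresholds so that each candidate induces a distinct partition of the training data.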

Extending these results, one can ask how many 
words one needs to distinguish entire sets of dif- 
ferent models, as estimated by building a multi- 
class decision tree and measuring its test loss 
for different numbers of terminal nodes. We use 
Matlab's Statistical Toolbox with a binary multi- 
class cost function to decide the splitting at each 
node. To avoid over-fitting the data, we prune 
trained trees and select the subtree with minimal 
test loss by 10-fold cross-validation. 
Additionally, we propose a different approach using decision trees to find the most discriminative words. For every possible model pair $(i, j)$ with $1 \le i < j \le N_{mod}$, where $N_{mod}$ is the total number of models, we build a binary decision tree, but restricted so that at every level the same word has to be used for all the trees. At every level the best word is chosen according to the smallest average training loss over all binary trees. The model is not meant to be a substitute for an ordinary multi-class decision tree. It merely represents another algorithm which may be useful to find a fixed number of most discriminative words, for example for visualization of the distributions in a three-dimensional subspace.



3 Network Models 



We sample training data for undirected graphs 
from six growth models, one scale-free static 
model L12Jill|[Il, the Small World model d, 
and the Erdos-Renyi model IITSll . Among the 
six growth models two are based on preferen- 
tial attachment 117611 11711. three on a duplication- 
mutation mechanism iTTTlllfTSll . and one on 
purely random growth |[T9ll . For directed 
graphs we similarly train on two preferen- 
tial attachment models |20|, two static models 
|pT||22J|8|, three duplication-mutation models 
||23I |24|, and the directed Erdos-Renyi model 
in31l . More detailed descriptions and source 
code are available on our website Q. 

In order to classify real data, we sample training examples of the given models with a fixed total number of nodes $N_0$, and allow a small interval $I_M$ of 1-2% around the total number of edges $M_0$ of the considered real data set. All additional model parameters are sampled uniformly over a given range which is specified by the model's creators in most cases and otherwise can be given reasonable bounds. A generated graph is accepted if its number of edges $M$ falls into the specified interval $I_M$ around $M_0$, thereby creating a distribution of graphs associated with each model which could describe the real data set with given $N_0$ and $M_0$.
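This acceptance step amounts to rejection sampling on the edge count. The sketch below uses a directed Erdos-Renyi generator as a stand-in for an arbitrary model, draws its single parameter uniformly from a range, and keeps a graph only if its edge count lies within a relative tolerance of $M_0$. The symmetric interval, the parameter range, the toy sizes, and all function names are assumptions made for illustration.

```python
import random


def sample_directed_er(n_nodes, p, rng):
    """Toy generator: directed Erdos-Renyi graph as a set of (source, target) edges."""
    return {(i, j) for i in range(n_nodes) for j in range(n_nodes)
            if i != j and rng.random() < p}


def sample_accepted(n_nodes, m0, p_range, n_samples,
                    rel_tol=0.02, seed=0, max_tries=20000):
    """Keep generated graphs whose edge count lies within rel_tol of m0."""
    rng = random.Random(seed)
    low, high = m0 * (1.0 - rel_tol), m0 * (1.0 + rel_tol)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_samples:
            break
        p = rng.uniform(p_range[0], p_range[1])  # model parameter drawn uniformly
        edges = sample_directed_er(n_nodes, p, rng)
        if low <= len(edges) <= high:
            accepted.append(edges)
    return accepted


# Toy sizes (illustrative only, much smaller than the real data sets).
graphs = sample_accepted(n_nodes=30, m0=60, p_range=(0.04, 0.10), n_samples=5)
```

The cap on the number of attempts is a practical safeguard: a model whose parameter range rarely produces the target edge count would otherwise loop for a long time.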



4 Results 



We apply our methods to three different real data sets: the E. coli genetic network [25] (directed), the S. cerevisiae protein interaction network [26] (undirected), and the C. elegans neural network [27] (directed).

Each node in E. coli's genetic network repre- 
sents an operon coding for a putative transcrip- 
tional factor. An edge exists from operon i to 
operon j if operon i directly regulates j by bind- 
ing to its operator site. This gives a very sparse 
adjacency matrix with a total of 423 nodes and 
519 edges. 

The S. cerevisiae protein interaction network 
has 2114 nodes and 2203 undirected edges. Its 
sparseness is therefore comparable to E. coli's 
genetic network. 

The C. elegans data set represents the organ- 
ism's fully mapped neural network. Here, each 
node is a neuron and each edge between two 
nodes represents a functional, directed connec- 
tion between two neurons. The network consists 
of 306 neurons and 2359 edges, and is therefore 
about 7 times more dense than the other two net- 
works. 

We create training data for undirected or directed models according to the real data set. All parameters other than the numbers of nodes and edges were drawn from a uniform distribution over their range. We sampled 1000 examples per model for each real data set, trained a pairwise multi-class SVM on 4/5 of the sampled data and tested on the 1/5 hold-out set. We determine a prediction by counting votes for the different classes. Table 1 summarizes the main results.
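The voting step can be sketched as follows. Each pairwise classifier contributes one vote, cast for the first model of the pair when its decision value $f(x)$ is positive and for the second otherwise; this sign convention and the function name are assumptions. The three decision values are taken from the first rows of the supplementary pairwise table.

```python
from collections import Counter


def count_votes(models, decision_values):
    """Tally one vote per pairwise classifier from the sign of f(x)."""
    votes = Counter({model: 0 for model in models})
    for (first, second), f_value in decision_values.items():
        votes[first if f_value > 0 else second] += 1
    return votes


models = ["Kumar", "Krapivsky-Bianconi", "Krapivsky"]
# f(x) of the real network for each pair (first model wins if positive).
pairwise = {
    ("Kumar", "Krapivsky-Bianconi"): 1.48,
    ("Kumar", "Krapivsky"): 2.32,
    ("Krapivsky-Bianconi", "Krapivsky"): 2.44,
}
votes = count_votes(models, pairwise)
winner = votes.most_common(1)[0][0]
```

Restricted to these three models, Kumar collects both of its pairwise votes, in line with its position at the top of the full table.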





                             E. coli   C. elegans   S. cerevisiae
$\langle L_{tr} \rangle$     1.6%      0.5%         2.1%
$\langle L_{tst} \rangle$    1.6%      0.5%         1.8%
$\langle N_{sv} \rangle$     109       51           106
Winner                       Kumar     MZ           Sole
Robustness                   1.0       0.97         0.64

Table 1: Results of multi-class SVM. $\langle L_{tr} \rangle$ is the empirical training loss averaged over all pairwise classifiers, $\langle L_{tst} \rangle$ is the averaged empirical test loss. $\langle N_{sv} \rangle$ is the average number of support vectors. The winner is the model that got the highest number of votes when classifying the given real data set.

All three classifiers show very low test loss and 
two of them a very high robustness. The average 
number of support vectors is relatively small. 
Indeed, some pairwise classifiers had as few as 
three support vectors and more than half of them 
had zero test loss. All of this suggests the exis- 
tence of a small subset of words which can dis- 
tinguish among most of these models. 

The predicted models Kumar, MZ, and Sole 
are based on very similar mechanisms of du- 
plication and mutation. The model by Kumar 
et al was originally meant to explain various 
properties of the WWW. It is based on a du- 
plication mechanism, where at every time step 
a prototype for the newly introduced node is 
chosen at random, and connected to the pro- 
totype's neighbors or other randomly chosen 
nodes with probability p. It is therefore built on 
an imperfect copying mechanism which can also 
be interpreted as duplication-mutation, often 
evoked when considering genetic and protein- 
interaction networks. Sole is based on the same 



idea, but allows two free parameters, a prob- 
ability controlling the number of edges copied 
and a probability controlling the number of ran- 
dom edges created. MZ is essentially a di- 
rected version of Sole. Moreover, we observe 
that none of the preferential attachment models 
came close to being a predicted model for one 
of our biological networks even though they, and 
other preferential attachment models in the liter- 
ature, were created to explain power-law degree 
distributions. The duplication-mutation scheme 
arises as the more successful one. 

Kumar and MZ were classified with almost per- 
fect robustness against 500-dimensional sub- 
space sampling. With 26 different choices of 
subspaces, E. coli was always classified as Ku- 
mar. We therefore assess with high confidence 
that Kumar and MZ come closest to model- 
ing E. coli and C. elegans, respectively. In 
the case of Sole and the S. cerevisiae pro- 
tein network we observed fluctuations in the 
assignment to the best model. In 3 out of 22
trials S. cerevisiae was classified as Vazquez
(duplication-mutation), other times as Barabasi
(preferential attachment), Klemm (duplication- 
mutation), Kim (scale-free static) or Flammini 
(duplication-mutation) depending on the subset 
of words chosen. This clearly indicates that dif- 
ferent features support different models. There- 
fore the confidence in classifying S. cerevisiae 
to be Sole is limited. 

The preference of individual words for individual models is investigated using kernel density estimation (Section 2.3) by finding words which maximize $p_i(x_0)/p_j(x_0)$ for two different models ($i$ and $j$) at a word value of the real data set $x_0$. Figure 4 shows the sampled distribution and estimated density for the word which most strongly disfavors the winning model relative to its runner-up. The opposite case is shown in Figure 3 for E. coli, where the word supports the winning model and disfavors its runner-up. More specifically we are
able to verify that most of the words of E. coli 
Figure 3: Kernel density estimation of nnzD(AUTAUTAUAUTA) for two models of E. coli (Kumar and Krapivsky-Bianconi). Log-likelihoods: $\log(p_{kumar}) = -4.22$, $\log(p_{krap-bianc}) = -12.0$.



Figure 4: Kernel density estimation of nnzD(AUADATAUA) for two top-scoring models of C. elegans (Middendorf-Ziv and Grindrod). Log-likelihoods: $\log(p_{mz}) = -376$, $\log(p_{grind}) = -6.23$.



are most likely to be generated by Kumar. In- 
deed, out of 1897 words taking at least 2 inte- 
ger values for all of the models (density estima- 
tion for a single value is not meaningful), the 
estimated density at the E. coli word value was 
highest for Kumar in 1297 cases, for Krapivsky- 
Bianconi in 535 cases and for Krapivsky in only 
65 cases. 

Figure 3 shows the distributions for the word nnzD(AUTAUTAUAUTA), which had a maximum ratio of probability density of Kumar over that of Krapivsky-Bianconi at the E. coli position. E. coli in fact has a zero word count, meaning that none of the associated subgraphs shown in Figure 5 actually occur in E. coli. Four of those subgraphs have a mutual edge, which is absent in the E. coli network and also impossible to generate in a Kumar graph. Krapivsky-Bianconi graphs allow for mutual edges, which could be one of the reasons for a higher count in this word. Another source might be that the fifth subgraph, showing a higher-order feed-forward loop, is more likely to be generated in a Krapivsky-Bianconi graph than in a Kumar graph. This subgraph also has to be absent in the E. coli network since it gives a zero word value, showing that the Kumar and the Krapivsky-Bianconi models both have a tendency to give rise to a topological structure that does not exist in E. coli. This analysis gives an example of how these findings are useful in refining network models and in deepening our understanding of real networks. For further discussions refer to our website [1].

The SVM results suggest that one may only need a small subset of words to be able to separate most of the models with almost zero test loss. The simplest approach to find such a subset is to look at every word for a given pair of models and compute the best split, then rank words by lowest training loss. We find that among the most discriminative words some occur very often, such as nnzAA or nnzATA, which count the pairs of edges attached to the same vertex and either pointing in the same direction or pointing away from each other, respectively. Other frequent words include nnzDAA, nnzDATA and sumUATA.
Figure 5: Subgraphs associated with the word nnzD(AUTAUTAUAUTA). The word has a non-zero value iff at least one of these subgraphs occurs in the network.

A striking feature of this single-word analysis is that the test losses associated with simple one-dimensional classifiers are comparable to the SVM test loss, confirming that most pairs of models are separable with only a few words. To consider all of the models at once and not just in pairs, we apply both tree algorithms described in Section 2.4 to all three data sets. Figures 6 and 7 show scatter plots of the training data using the three most discriminative words. Taking those three words, the average training loss over all pairs of models is 1.7%, 0.8% and 0.2% for the E. coli, C. elegans and S. cerevisiae training data, respectively.



5 Conclusions 

It is not surprising that models with different mechanisms are distinguishable; however, the fact that these models have not been separated in a systematic manner to date points to the inadequacy of current metrics popular in the network theory community. We have shown that a systematic enumeration of countably infinite features of graphs can be successfully used to find new metrics which are highly efficient in separating various kinds of models. Furthermore, they allow us to define a high-dimensional input space for classification algorithms which for the first time are able to decide which of a given set of models most accurately describes three exemplary biological networks.

Figure 6: E. coli and seven directed models. The distributions in word space are shown for a projection onto the subspace of the three most discriminative words. Subgraphs associated with every word are also shown.
Supplementary tables

Name         Fundamental Mechanism                                      References

Bianconi     Growth model with a probability of attaching to an
             existing node $p \propto \eta_i k_i$, where $\eta_i$ is
             a fitness parameter. Here we use a random fitness
             landscape, where $\eta$ is drawn from a uniform
             distribution in (0, 1).

Callaway     Growth model adding one node and several edges
             between randomly chosen existing nodes (not
             necessarily the newly introduced one) at every time
             step.

Kim          A static model giving rise to a scale-free network.        [12], [8], [13]
             Edges are created between nodes chosen with a
             probability $p \propto i^{-\alpha}$, where $i$ is the
             label of the node and $\alpha$ a constant parameter in
             (0, 1).

Erdos        Undirected random graph.                                   [15]

Flammini     Growing graph based on duplication modeling protein
             interactions. At every time step a prototype is chosen
             randomly. With probability $q$ edges of the prototype
             are copied. With probability $p$ an edge to the
             prototype is created.

Klemm        Growing graph using sets of active and inactive nodes
             to model citation networks.

Small World  Interpolation between a regular lattice and a random
             graph. We replace edges in the regular lattice by

Barabasi     Growing graph with a probability of attaching to an
             existing node $p \propto k_i$. ("Bianconi" with
             $\eta_i = 1$ for all $i$.)

Sole         Growing graph initialized with a 5-ring substrate. At
             every time step a new node is added and a prototype is
             chosen at random. The prototype's edges are copied
             with a probability $p$. Furthermore, random nodes are
             connected to the newly introduced node with
             probability $q/N$, where $p$ and $q$ are given
             parameters in (0, 1) and $N$ is the number of total
             nodes at the considered time step.

Table 2: Undirected network models. $k_i$ is the degree of the $i$-th node.
Name                Fundamental Mechanism                                      References

Kim                 Directed version of "Kim". A "static" model giving
                    rise to a scale-free network. Edges are created
                    between nodes chosen with probabilities
                    $p \propto i^{-\alpha_{in}}$ and
                    $q \propto j^{-\alpha_{out}}$, where $\alpha_{in}$ and
                    $\alpha_{out}$ are fixed parameters chosen in (0, 1)
                    and $i$ ($j$) is the label of the $i$-th ($j$-th)
                    node.

Erdos               Directed random graph.                                     [15]

Grindrod            Static graph. Edges are created between nodes $i$,
                    $j$ with probability $p = b\lambda^{|i-j|}$, where
                    $b$ and $\lambda$ are fixed parameters.

Krapivsky           Growing graph modeling the WWW. At every time step         [20]
                    either a new edge, or a new node with an edge, is
                    created. Nodes to connect are chosen with
                    probability $p \propto k_{i,in} + a$ and
                    $q \propto k_{j,out} + b$ based on preferential
                    attachment with fixed real-valued offsets $a$ and
                    $b$.

Krapivsky-Bianconi  Extension of "Krapivsky" using a random fitness            (original)
                    landscape multiplying the probabilities for
                    preferential attachment. It is the directed analog
                    of "Bianconi", itself an extension of "Barabasi".

Kumar               Growing graph based on a copying mechanism to model
                    the WWW. At every time step a prototype $P$ is
                    chosen at random. Then for every edge connected to
                    $P$, with probability $p$ an edge between the newly
                    introduced node and $P$'s neighbor is created, and
                    with probability $1 - p$ an edge between the new
                    node and a randomly chosen other node is created.

Middendorf-Ziv      Growing directed graph modeling biological network         original
(MZ)                dynamics. A prototype is chosen at random and
                    duplicated. The prototype or progenitor node has
                    edges pruned with probability $\delta$ and edges
                    added with probability $\alpha \ll \delta$. Based
                    loosely on the undirected protein network model of
                    Sole et al. [18].

Vazquez             Growth model based on a recursive 'copying'
                    mechanism, continuing to 2nd nearest neighbors, 3rd
                    nearest neighbors etc. The authors call it a 'random
                    walk' mechanism.

Table 3: Directed network models. $k_{i,in}$ ($k_{i,out}$) is the in-(out-)degree of the $i$-th node.
ROW MODEL            VOTES   COLUMN MODEL             f(x)     Ltst      Ltr     Nsv
Kumar                7/7     Krapivsky-Bianconi       1.48     5.3%     4.4%     139
                             Krapivsky                2.32     4.5%     3.2%     122
                             Kim                      2.80     0.8%     0.7%     194
                             Vazquez                  1.12     0.0%     0.0%       9
                             Erdos                    3.58     0.0%     0.0%      10
                             Grindrod                 3.11     0.0%     0.0%       9
                             MZ                       1.26     0.0%     0.0%       9

Krapivsky-Bianconi   6/7     Kumar                   -1.48     5.3%     4.4%     139
                             Krapivsky                2.44    32.8%    31.3%    1084
                             Kim                      2.49     0.8%     0.9%     178
                             Vazquez                  1.01     0.0%     0.0%      14
                             Erdos                    2.33     0.0%     0.0%      13
                             Grindrod                 2.30     0.0%     0.0%      11
                             MZ                       1.64     0.0%     0.0%       9

Krapivsky            5/7     Kumar                   -2.32     4.5%     3.2%     122
                             Krapivsky-Bianconi      -2.44    32.8%    31.3%    1084
                             Kim                      2.56     0.8%     1.6%     223
                             Vazquez                  0.95     0.0%     0.0%      12
                             Erdos                    2.67     0.0%     0.0%      13
                             Grindrod                 2.69     0.0%     0.0%      12
                             MZ                       1.72     0.0%     0.0%       9

Kim                  4/7     Kumar                   -2.80     0.8%     0.7%     194
                             Krapivsky-Bianconi      -2.49     0.8%     0.9%     178
                             Krapivsky               -2.56     0.8%     1.6%     223
                             Vazquez                  0.36     0.0%     0.0%      47
                             Erdos                    0.87     9.0%    10.5%     498
                             Grindrod                 1.53     3.0%     2.9%     180
                             MZ                       1.06     0.0%     0.1%      84

Vazquez              3/7     Kumar                   -1.12     0.0%     0.0%       9
                             Krapivsky-Bianconi      -1.01     0.0%     0.0%      14
                             Krapivsky               -0.95     0.0%     0.0%      12
                             Kim                     -0.36     0.0%     0.0%      47
                             Erdos                    0.60     0.0%     0.0%       8
                             Grindrod                 1.25     0.0%     0.0%       6
                             MZ                       1.23     0.0%     0.0%      10

Erdos                2/7     Kumar                   -3.58     0.0%     0.0%      10
                             Krapivsky-Bianconi      -2.33     0.0%     0.0%      13
                             Krapivsky               -2.67     0.0%     0.0%      13
                             Kim                     -0.87     9.0%    10.5%     498
                             Vazquez                 -0.60     0.0%     0.0%       8
                             Grindrod                 1.43     2.3%     2.3%     130
                             MZ                       1.36     0.0%     0.0%       7

Grindrod             1/7     Kumar                   -3.11     0.0%     0.0%       9
                             Krapivsky-Bianconi      -2.30     0.0%     0.0%      11
                             Krapivsky               -2.69     0.0%     0.0%      12
                             Kim                     -1.53     3.0%     2.9%     180
                             Vazquez                 -1.25     0.0%     0.0%       6
                             Erdos                   -1.43     2.3%     2.3%     130
                             MZ                       1.37     0.0%     0.0%      12

MZ                   0/7     Kumar                   -1.26     0.0%     0.0%       9
                             Krapivsky-Bianconi      -1.64     0.0%     0.0%       9
                             Krapivsky               -1.72     0.0%     0.0%       9
                             Kim                     -1.06     0.0%     0.1%      84
                             Vazquez                 -1.23     0.0%     0.0%      10
                             Erdos                   -1.36     0.0%     0.0%       7
                             Grindrod                -1.37     0.0%     0.0%      12

Table 4: SVM results for E. coli. $f(x) = w \cdot x_{E. coli} + b$, Ltst is the test loss, Ltr the training loss and Nsv the number of support vectors.
Results are shown for SVMs trained between every pair of models; if $f(x) > 0$, E. coli is classified as the row-header, if $f(x) < 0$ as the
column-header.




ROW MODEL            COLUMN MODEL          WORD               Ltst      Ltr
Kumar                Krapivsky-Bianconi    sum(ATA)           0.3%     0.1%
Kumar                Krapivsky             sum(ATA)           0.0%     0.4%
Kumar                Kim                   nnz(ATA)           0.0%     0.0%
Kumar                Vazquez               nnz D(AATA)        0.0%     0.0%
Kumar                Erdos                 nnz(ATA)           0.0%     0.0%
Kumar                Grindrod              nnz(ATA)           0.0%     0.0%
Kumar                MZ                    nnz(AA)            0.0%     0.0%
Krapivsky-Bianconi   Krapivsky             nnz(ADATA)        27.8%    26.9%
Krapivsky-Bianconi   Kim                   nnz(ATA)           0.0%     0.0%
Krapivsky-Bianconi   Vazquez               nnz D(ATA)         0.0%     0.0%
Krapivsky-Bianconi   Erdos                 nnz(ATA)           0.0%     0.0%
Krapivsky-Bianconi   Grindrod              nnz(ATA)           0.0%     0.0%
Krapivsky-Bianconi   MZ                    sum D(AA)          0.0%     0.1%
Krapivsky            Kim                   nnz(ATA)             0%     0.0%
Krapivsky            Vazquez               nnz D(AATA)          0%     0.0%
Krapivsky            Erdos                 nnz(ATA)             0%     0.0%
Krapivsky            Grindrod              nnz(ATA)             0%     0.0%
Krapivsky            MZ                    sum D(AA)            0%     0.0%
Kim                  Vazquez               nnz(ATA)           0.0%     0.0%
Kim                  Erdos                 sum U(AAUATA)      7.8%    10.1%
Kim                  Grindrod              sum U(AUTAA)       5.0%     5.6%
Kim                  MZ                    sum D(AADAA)       4.0%     4.6%
Vazquez              Erdos                 nnz(ATA)           0.0%     0.0%
Vazquez              Grindrod              nnz(ATA)           0.0%     0.0%
Vazquez              MZ                    nnz(AA)            0.0%     0.0%
Erdos                Grindrod              sum D(AADATA)      2.5%     2.8%
Erdos                MZ                    nnz(AA)            0.0%     0.0%
Grindrod             MZ                    nnz(AA)            0.0%     0.0%

Table 5: Most discriminative words for the E. coli training data based on lowest test loss by 1-dimensional splitting for every pair of
models. Ltst is the test loss and Ltr the training loss.
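
One way to read "1-dimensional splitting" is as a single-threshold classifier fitted on one word at a time, with the word of lowest test loss reported per model pair. The sketch below follows that reading and is not taken from the authors' code. The function names, the midpoint choice of candidate thresholds, and the toy numbers are all assumptions made for illustration.

```python
def split_loss(threshold, sign, values, labels):
    """Fraction of points misclassified by: predict sign if value > threshold, else -sign."""
    errors = sum(1 for v, y in zip(values, labels)
                 if (sign if v > threshold else -sign) != y)
    return errors / len(values)


def best_split(values, labels):
    """Return (threshold, sign, training_loss) minimizing the training loss."""
    pts = sorted(set(values))
    # Candidate thresholds: one below the minimum, then midpoints between neighbors.
    candidates = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    best = None
    for t in candidates:
        for sign in (+1, -1):
            loss = split_loss(t, sign, values, labels)
            if best is None or loss < best[2]:
                best = (t, sign, loss)
    return best


def most_discriminative_word(train, test):
    """train and test map word -> (values, labels); pick the word with lowest test loss."""
    results = {}
    for word, (values, labels) in train.items():
        t, sign, l_tr = best_split(values, labels)
        l_tst = split_loss(t, sign, *test[word])
        results[word] = (l_tst, l_tr)
    best_word = min(results, key=lambda w: results[w])
    return best_word, results[best_word]


# Toy word values for two models labeled +1 and -1 (made-up numbers).
train = {
    "word_a": ([1, 2, 3, 10, 11, 12], [-1, -1, -1, 1, 1, 1]),
    "word_b": ([1, 10, 2, 11, 3, 12], [-1, -1, -1, 1, 1, 1]),
}
test = {
    "word_a": ([0.5, 4, 9, 20], [-1, -1, 1, 1]),
    "word_b": ([0.5, 9, 4, 20], [-1, -1, 1, 1]),
}
word, (l_tst, l_tr) = most_discriminative_word(train, test)
```

In this toy case `word_a` separates the two classes perfectly on both sets, which mirrors the many 0.0% entries in the table.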



RANKING   WORD                       Ltr
1         sum U(ATA)                 5.8%
2         nnz D(AUTA)                2.4%
3         sum D(AADAA)               1.7%
4         nnz U(AUAUTAUTA)           1.4%
5         nnz D(AUTAUAUTAUA)         1.3%
6         nnz U(ADAUTA)              1.2%
7         sum D(AUTAUTAUTA)          1.2%
8         nnz U(AAUA)                1.1%
9         sum(AUTA)                  1.1%
10        sum U(ADAUADAUTA)          1.0%

[ASSOCIATED SUBGRAPHS column: subgraph drawings not recoverable]

Table 6: Ranking of words found by binary pairwise trees for the E. coli training data. Ltr for a
word ranked n is the average training loss over all pairwise trees, where every tree has depth n and
splits the data using words 1 to n in the given order.
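
To make the word notation concrete, the sketch below evaluates a few short words on a three-node directed graph. It rests on assumptions that are not spelled out in this section: that adjacent letters denote matrix products of the adjacency matrix A, that D keeps only the diagonal, that U keeps only the off-diagonal part, and that `nnz` and `sum` count the nonzero entries and add up all entries. The helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diag_part(m):
    """Assumed meaning of D: keep the diagonal, zero everything else."""
    return [[m[i][j] if i == j else 0 for j in range(len(m))] for i in range(len(m))]


def offdiag_part(m):
    """Assumed meaning of U: zero the diagonal, keep everything else."""
    return [[m[i][j] if i != j else 0 for j in range(len(m))] for i in range(len(m))]


def nnz(m):
    """Number of nonzero entries."""
    return sum(1 for row in m for x in row if x != 0)


def total(m):
    """Sum of all entries (the 'sum' operation in the word names)."""
    return sum(x for row in m for x in row)


# Directed graph with edges 0->1, 1->0 and 1->2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 0, 0]]
AA = matmul(A, A)

nnz_D_AA = nnz(diag_part(AA))     # nodes lying on a 2-cycle: 2
sum_D_AA = total(diag_part(AA))   # closed walks of length 2: 2
nnz_U_AA = nnz(offdiag_part(AA))  # ordered pairs joined by a 2-step path: 1
sum_AA = total(AA)                # all walks of length 2: 3
```

Under these assumptions, a word such as nnz D(AA) responds to reciprocated edges, which suggests why short words can already separate models that differ in how often mutual links arise.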





ROW MODEL            COLUMN MODEL          WORD               Ltst      Ltr
MZ                   Grindrod              sum(AA)            0.0%     0.0%
MZ                   Kim                   nnz D(AAAAUA)      3.8%     4.3%
MZ                   Erdos                 sum(AA)            0.0%     0.0%
MZ                   Kumar                 nnz(AA)            0.0%     0.0%
MZ                   Krapivsky-Bianconi    sum D(AATA)        0.0%     0.0%
MZ                   Vazquez               nnz(AA)            0.0%     0.0%
MZ                   Krapivsky             sum D(AATA)        0.0%     0.0%
Grindrod             Kim                   sum(ATADATA)       3.8%     5.1%
Grindrod             Erdos                 sum D(AAUATA)      1.5%     1.3%
Grindrod             Kumar                 nnz D(AA)          0.0%     0.0%
Grindrod             Krapivsky-Bianconi    sum(AA)            0.0%     0.0%
Grindrod             Vazquez               nnz(AA)            0.0%     0.0%
Grindrod             Krapivsky             sum(AA)            0.0%     0.0%
Kim                  Erdos                 sum D(ATAUATA)     1.0%     2.3%
Kim                  Kumar                 nnz D(AA)          0.0%     0.0%
Kim                  Krapivsky-Bianconi    nnz D(ATA)         0.0%     0.0%
Kim                  Vazquez               nnz(AA)            0.0%     0.0%
Kim                  Krapivsky             nnz D(ATA)         0.0%     0.0%
Erdos                Kumar                 nnz(AA)            0.0%     0.0%
Erdos                Krapivsky-Bianconi    sum(AA)            0.0%     0.0%
Erdos                Vazquez               nnz(AA)            0.0%     0.0%
Erdos                Krapivsky             sum(AA)            0.0%     0.0%
Kumar                Krapivsky-Bianconi    nnz D(AA)          0.0%     0.0%
Kumar                Vazquez               nnz(AA)            0.0%     0.0%
Kumar                Krapivsky             nnz D(AA)          0.0%     0.0%
Krapivsky-Bianconi   Vazquez               nnz(AA)            0.0%     0.0%
Krapivsky-Bianconi   Krapivsky             nnz(AA)           16.5%    15.4%
Vazquez              Krapivsky             nnz(AA)            0.0%     0.0%

Table 7: Most discriminative words for the C. elegans training data based on lowest test loss by 1-dimensional splitting for every pair
of models. Ltst is the test loss and Ltr the training loss.



RANKING   WORD                       Ltr
1         sum D(AAUAUAA)             3.6%
2         nnz D(AUAUA)               1.1%
3         sum U(AUTA)                0.8%
4         sum D(AUAUTAUAUTA)         0.6%
5         nnz U(AUA)                 0.5%
6         sum D(AA)                  0.5%
7         sum U(AUTADAUAUA)          0.4%
8         nnz U(ADAUTA)              0.4%
9         nnz D(AUADATAUA)           0.4%

[ASSOCIATED SUBGRAPHS column: subgraph drawings not recoverable]

Table 8: Ranking of words found by binary pairwise trees for the C. elegans training data. Ltr for
a word ranked n is the average training loss over all pairwise trees, where every tree has depth n
and splits the data using words 1 to n in the given order.
ROW MODEL            VOTES   COLUMN MODEL             f(x)     Ltst      Ltr     Nsv
MZ                   7/7     Kim                      1.82     0.0%     0.0%      48
                             Grindrod                 0.49     0.0%     0.0%      11
                             Krapivsky-Bianconi       3.19     0.0%     0.0%      48
                             Krapivsky                2.28     0.0%     0.0%      37
                             Erdos                    1.18     0.0%     0.0%      10
                             Kumar                    0.91     0.0%     0.0%       8
                             Vazquez_K5               1.25     0.0%     0.0%       5

Kim                  6/7     MZ                      -1.82     0.0%     0.0%      48
                             Grindrod                 0.99     2.5%     2.8%     165
                             Krapivsky-Bianconi       0.63     0.0%     0.0%      14
                             Krapivsky                0.06     0.0%     0.0%      21
                             Erdos                   16.99     3.5%     4.9%     294
                             Kumar                    1.25     0.0%     0.0%      12
                             Vazquez_K5               1.25     0.0%     0.0%       4

Grindrod             5/7     MZ                      -0.49     0.0%     0.0%      11
                             Kim                     -0.99     2.5%     2.8%     165
                             Krapivsky-Bianconi       0.46     0.0%     0.0%       8
                             Krapivsky                0.39     0.0%     0.0%      --
                             Erdos                   55.68     2.0%     1.6%     110
                             Kumar                    7.88     0.0%     0.0%      --
                             Vazquez_K5               3.87     0.0%     0.0%       3

Krapivsky-Bianconi   4/7     MZ                      -3.19     0.0%     0.0%      48
                             Kim                     -0.63     0.0%     0.0%      14
                             Grindrod                -0.46     0.0%     0.0%       8
                             Krapivsky                0.44     6.5%     6.8%     572
                             Erdos                    0.10     0.0%     0.0%       4
                             Kumar                    0.25     0.0%     0.0%       9
                             Vazquez_K5               0.58     0.0%     0.0%       8

Krapivsky            2/7     MZ                      -2.28     0.0%     0.0%      31
                             Kim                     -0.06     0.0%     0.0%      21
                             Grindrod                -0.39     0.0%     0.0%      --
                             Krapivsky-Bianconi      -0.44     6.5%     6.8%     512
                             Erdos                    0.23     0.0%     0.0%       6
                             Kumar                   -0.00     0.0%     0.0%      10
                             Vazquez_K5               0.15     0.0%     0.0%       8

Erdos                2/7     MZ                      -1.18     0.0%     0.0%      10
                             Kim                    -16.99     3.5%     4.9%     294
                             Grindrod               -55.68     2.0%     1.6%     110
                             Krapivsky-Bianconi      -0.10     0.0%     0.0%       4
                             Krapivsky               -0.23     0.0%     0.0%       6
                             Kumar                    5.99     0.0%     0.0%       6
                             Vazquez_K5               1.17     0.0%     0.0%       4

Kumar                2/7     MZ                      -0.91     0.0%     0.0%       8
                             Kim                     -1.25     0.0%     0.0%      12
                             Grindrod                -7.88     0.0%     0.0%      --
                             Krapivsky-Bianconi      -0.25     0.0%     0.0%       9
                             Krapivsky                0.00     0.0%     0.0%      10
                             Erdos                   -5.99     0.0%     0.0%       6
                             Vazquez_K5             168.96     0.0%     0.0%       4

Vazquez_K5           0/7     MZ                      -1.25     0.0%     0.0%       5
                             Kim                     -1.25     0.0%     0.0%       4
                             Grindrod                -3.87     0.0%     0.0%       3
                             Krapivsky-Bianconi      -0.58     0.0%     0.0%       8
                             Krapivsky               -0.15     0.0%     0.0%       8
                             Erdos                   -1.17     0.0%     0.0%       4
                             Kumar                 -168.96     0.0%     0.0%       4

Table 9: SVM results for C. elegans. $f(x) = w \cdot x_{C. elegans} + b$, Ltst is the test loss, Ltr the training loss and Nsv the number of support
vectors. Results are shown for SVMs trained between every pair of models; if $f(x) > 0$, C. elegans is classified as the row-header, if
$f(x) < 0$ as the column-header.



RANKING   WORD                       Ltr
1         sum U(ATADAA)              0.090%
2         nnz D(ATAUAAA)             0.030%
3         nnz D(AA)                  0.019%
4         nnz(ADATAUAA)              0.016%
5         nnz(ATAUAUAA)              0.014%
6         sum D(AAUAA)               0.013%
7         nnz D(ATAUAA)              0.013%
8         sum D(AAAUAA)              0.012%
9         nnz D(AAUAA)               0.012%
10        sum(ADAAUAA)               0.012%

[ASSOCIATED SUBGRAPHS column: subgraph drawings not recoverable]

Table 10: Ranking of words found by binary pairwise trees for the S. cerevisiae training data. Ltr
for a word ranked n is the average training loss over all pairwise trees, where every tree has depth
n and splits the data using words 1 to n in the given order.
ROW MODEL            VOTES   COLUMN MODEL             f(x)     Ltst      Ltr     Nsv
Sole                 12/10   Callaway                 8.57     0.0%     0.0%      28
                             Flammini                 4.67     0.0%     0.0%      36
                             Vazquez                  3.67     0.0%     0.0%      --
                             Kim                     19.25     1.2%     1.2%     306
                             Grindrod sym            10.41     0.0%     0.0%      20
                             Barabasi                 1.75     0.0%     0.0%      16
                             Erdos                   13.12     0.0%     0.0%      14
                             Klemm                    4.73     0.0%     0.0%      41
                             Small World              8.56     0.0%     0.0%      14
                             Bianconi                 1.77     0.0%     0.0%      15

Callaway             11/10   Sole                    -8.57     0.0%     0.0%      28
                             Flammini                 0.27     0.0%     0.0%       7
                             Vazquez                  0.44     0.0%     0.0%       2
                             Kim                      0.37     0.0%     0.0%      48
                             Grindrod sym             0.76     0.0%     0.0%       4
                             Barabasi                 0.86     0.0%     0.0%      10
                             Erdos                    0.96     0.0%     0.0%       3
                             Klemm                    0.57     0.0%     0.0%      13
                             Small World              0.76     0.0%     0.0%       4
                             Bianconi                 0.95     0.0%     0.0%      11

Flammini             9/10    Sole                    -4.67     0.0%     0.0%      36
                             Callaway                -0.27     0.0%     0.0%       7
                             Vazquez                 -0.86     0.0%     0.0%      29
                             Kim                      0.32     0.0%     0.1%      94
                             Grindrod sym             7.52     6.0%     3.8%     529
                             Barabasi                 0.17     0.0%     0.0%      --
                             Erdos                    0.52     0.0%     0.0%      30
                             Klemm                    2.38     0.8%     0.8%     147
                             Small World              4.25     7.2%     7.6%     384
                             Bianconi                 0.33     0.0%     0.0%      14

Vazquez              9/10    Sole                    -3.67     0.0%     0.0%      --
                             Callaway                -0.44     0.0%     0.0%       2
                             Flammini                 0.86     0.0%     0.0%      29
                             Kim                      0.35     0.0%     0.0%      20
                             Grindrod sym            -0.12     0.0%     0.0%      --
                             Barabasi                 0.17     0.0%     0.0%       8
                             Erdos                    0.95     0.0%     0.0%       7
                             Klemm                    3.39     0.0%     0.0%      23
                             Small World              0.47     0.0%     0.0%       5
                             Bianconi                 0.23     0.0%     0.0%       8

Kim                  7/10    Sole                   -19.25     1.2%     1.2%     306
                             Callaway                -0.37     0.0%     0.0%      48
                             Flammini                -0.32     0.0%     0.1%      94
                             Vazquez                 -0.35     0.0%     0.0%      20
                             Grindrod sym            -1.29     1.5%     1.4%     107
                             Barabasi                 1.41     0.0%     0.0%      26
                             Erdos                    4.55    12.2%    16.7%     603
                             Klemm                    1.15     0.0%     0.0%      55
                             Small World              5.60     0.2%     0.4%     309
                             Bianconi                 1.44     0.0%     0.0%      24

Grindrod sym         7/10    Sole                   -10.41     0.0%     0.0%      20
                             Callaway                -0.76     0.0%     0.0%       4
                             Flammini                -7.52     6.0%     3.8%     529
                             Vazquez                  0.12     0.0%     0.0%      --
                             Kim                      1.29     1.5%     1.4%     107
                             Barabasi                -0.10     0.0%     0.0%       7
                             Erdos                    3.10     1.2%     0.9%      66
                             Klemm                    2.75     0.0%     0.0%      44
                             Small World             -2.11    10.5%    10.1%     291
                             Bianconi                 0.08     0.0%     0.0%      11

Barabasi             6/10    Sole                    -1.75     0.0%     0.0%      16
                             Callaway                -0.86     0.0%     0.0%      10
                             Flammini                -0.17     0.0%     0.0%      --
                             Vazquez                 -0.17     0.0%     0.0%       8
                             Kim                     -1.41     0.0%     0.0%      26
                             Grindrod sym             0.10     0.0%     0.0%       7
                             Erdos                    0.17     0.0%     0.0%       7
                             Klemm                   -2.26     3.5%     5.6%     281
                             Small World              0.37     0.0%     0.0%      --
                             Bianconi                 2.48     2.2%     3.0%     111

Erdos                4/10    Sole                   -13.12     0.0%     0.0%      14
                             Callaway                -0.96     0.0%     0.0%       3
                             Flammini                -0.52     0.0%     0.0%      30
                             Vazquez                 -0.95     0.0%     0.0%       7
                             Kim                     -4.55    12.2%    16.7%     603
                             Grindrod sym            -3.10     1.2%     0.9%      66
                             Barabasi                -0.17     0.0%     0.0%       7
                             Klemm                    1.35     0.0%     0.0%      24
                             Small World             11.16     0.0%     0.0%      10
                             Bianconi                 0.07     0.0%     0.0%       9

Klemm                2/10    Sole                    -4.73     0.0%     0.0%      41
                             Callaway                -0.57     0.0%     0.0%      13
                             Flammini                -2.38     0.8%     0.8%     147
                             Vazquez                 -3.39     0.0%     0.0%      23
                             Kim                     -1.15     0.0%     0.0%      55
                             Grindrod sym            -2.75     0.0%     0.0%      44
                             Barabasi                 2.26     3.5%     5.6%     281
                             Erdos                   -1.35     0.0%     0.0%      24
                             Small World             -4.53     0.0%     0.0%      33
                             Bianconi                 2.14     1.8%     0.9%     106

Small World          2/10    Sole                    -8.56     0.0%     0.0%      14
                             Callaway                -0.76     0.0%     0.0%       4
                             Flammini                -4.25     7.2%     7.6%     384
                             Vazquez                 -0.47     0.0%     0.0%       5
                             Kim                     -5.60     0.2%     0.4%     309
                             Grindrod sym             2.11    10.5%    10.1%     297
                             Barabasi                -0.37     0.0%     0.0%      --
                             Erdos                  -11.16     0.0%     0.0%      10
                             Klemm                    4.53     0.0%     0.0%      33
                             Bianconi                -0.02     0.0%     0.0%       9

Bianconi             1/10    Sole                    -1.77     0.0%     0.0%      15
                             Callaway                -0.95     0.0%     0.0%      11
                             Flammini                -0.33     0.0%     0.0%      14
                             Vazquez                 -0.23     0.0%     0.0%       8
                             Kim                     -1.44     0.0%     0.0%      24
                             Grindrod sym            -0.08     0.0%     0.0%      11
                             Barabasi                -2.48     2.2%     3.0%     111
                             Erdos                   -0.07     0.0%     0.0%       9
                             Klemm                   -2.14     1.8%     0.9%     106
                             Small World              0.02     0.0%     0.0%       9

Table 11: SVM results for S. cerevisiae (only 11 models out of 13 shown). $f(x) = w \cdot x_{S. cerevisiae} + b$, Ltst is the test loss, Ltr the
training loss and Nsv the number of support vectors. Results are shown for SVMs trained between every pair of models; if $f(x) > 0$, S.
cerevisiae is classified as the row-header, if $f(x) < 0$ as the column-header.





ROW MODEL            COLUMN MODEL          WORD               Ltst      Ltr
Sole                 Callaway              nnz(AAAAA)         0.3%     0.5%
Sole                 Flammini              nnz(AAAAA)         3.8%     2.5%
Sole                 Vazquez               nnz(AA)            0.0%     0.0%
Sole                 Kim                   nnz(ADA)           7.2%     5.2%
Sole                 Grindrod sym          nnz(AA)            0.0%     0.0%
Sole                 Barabasi              nnz D(AA)          0.0%     0.0%
Sole                 Erdos                 nnz(AA)            0.0%     0.0%
Sole                 Klemm                 nnz D(AA)          0.0%     0.0%
Sole                 Small World           nnz(AA)            0.0%     0.0%
Sole                 Bianconi              nnz D(AA)          0.0%     0.0%
Callaway             Flammini              nnz(AAAAA)         0.0%     0.0%
Callaway             Vazquez               nnz(AA)            0.0%     0.0%
Callaway             Kim                   nnz(ADATAUAA)      3.0%     5.2%
Callaway             Grindrod sym          nnz(AA)            0.0%     0.0%
Callaway             Barabasi              nnz(AA)            0.0%     0.0%
Callaway             Erdos                 nnz(AA)            0.0%     0.0%
Callaway             Klemm                 nnz D(AA)          0.0%     0.0%
Callaway             Small World           nnz(AA)            0.0%     0.0%
Callaway             Bianconi              nnz(AA)            0.0%     0.0%
Flammini             Vazquez               nnz D(AAUAA)       0.0%     0.1%
Flammini             Kim                   nnz D(AAA)        13.8%    11.1%
Flammini             Grindrod sym          nnz U(ATADAAA)    14.0%    13.4%
Flammini             Barabasi              nnz D(AAUAA)       0.0%     0.2%
Flammini             Erdos                 sum(ADAAA)         0.0%     0.1%
Flammini             Klemm                 nnz D(AAUAA)       0.5%     0.2%
Flammini             Small World           sum D(AAUAA)       8.5%     8.9%
Flammini             Bianconi              nnz(AAA)           0.0%     0.0%
Vazquez              Kim                   nnz D(AA)          0.0%     0.0%
Vazquez              Grindrod sym          nnz D(ATAUAA)      0.0%     0.0%
Vazquez              Barabasi              nnz(AA)            0.0%     0.0%
Vazquez              Erdos                 nnz D(AA)          0.0%     0.0%
Vazquez              Klemm                 sum D(AAA)         0.0%     0.0%
Vazquez              Small World           nnz(AA)            0.0%     0.0%
Vazquez              Bianconi              nnz(AA)            0.0%     0.0%
Kim                  Grindrod sym          nnz(AAAAA)         0.8%     0.6%
Kim                  Barabasi              nnz D(AA)          0.0%     0.0%
Kim                  Erdos                 sum U(ATADAA)      8.0%     9.2%
Kim                  Klemm                 nnz D(AA)          0.0%     0.0%
Kim                  Small World           nnz(AA)            0.0%     0.0%
Kim                  Bianconi              nnz D(AA)          0.0%     0.0%
Grindrod sym         Barabasi              nnz(AA)            0.0%     0.0%
Grindrod sym         Erdos                 nnz D(ATAUAAA)     0.3%     0.1%
Grindrod sym         Klemm                 nnz(ADATAUAA)      0.0%     0.0%
Grindrod sym         Small World           nnz D(ATAUAAA)    18.5%    19.2%
Grindrod sym         Bianconi              nnz(AA)            0.0%     0.0%
Barabasi             Erdos                 nnz(AA)            0.0%     0.0%
Barabasi             Klemm                 nnz D(ATAUAA)      2.0%     2.7%
Barabasi             Small World           nnz(AA)            0.0%     0.0%
Barabasi             Bianconi              nnz D(ATAUAA)      0.0%     0.0%
Erdos                Klemm                 nnz D(AA)          0.0%     0.0%
Erdos                Small World           nnz(AA)            0.0%     0.0%
Erdos                Bianconi              nnz(AA)            0.0%     0.0%
Klemm                Small World           nnz D(AA)          0.0%     0.0%
Klemm                Bianconi              nnz D(ATAUAA)      0.0%     0.0%
Small World          Bianconi              nnz(AA)            0.0%     0.0%

Table 12: Most discriminative words for the S. cerevisiae training data based on lowest test loss by 1-dimensional splitting for every
pair of models. Ltst is the test loss and Ltr the training loss.



